ABSTRACT

Continuum fitting is an important aspect of Lyα forest science, since errors in the estimated
optical depths scale with the fractional continuum error. However, traditional methods of
estimating continua in noisy, moderate-resolution spectra (S/N < 10 pixel⁻¹ and R ~ 2000, e.g.
SDSS), such as power-law extrapolation or dividing by the mean spectrum, achieve no better than
~15% RMS accuracy. To improve on this, we introduce mean-flux regulated principal component
analysis (MF-PCA) continuum fitting. In this technique, PCA fitting is carried out redwards of
the quasar Lyα line in order to provide a prediction for the shape of the Lyα forest continuum.
The slope and amplitude of this continuum prediction are then corrected using external
constraints on the Lyα forest mean flux. From tests on mock spectra, we find that MF-PCA
reduces the errors to 8% RMS in S/N ~ 2 spectra, and < 5% RMS in spectra with S/N > 5. The
residual Fourier power in the continuum is decreased by a factor of a few in comparison with
dividing by the mean continuum, enabling Lyα flux power spectrum measurements to be extended to
~2× larger scales. Using this new technique, we make available continuum fits for 12,069
z > 2.3 Lyα forest spectra from SDSS DR7 for use by the community. This technique is also
applicable to future releases of the ongoing BOSS survey, which is obtaining ~150,000 Lyα
forest spectra at low signal-to-noise (S/N ~ 2).
Subject headings: intergalactic medium — quasars: emission lines — quasars: absorption lines - 
methods: data analysis 



1. INTRODUCTION 

Over the past two decades, the Lyman-α (Lyα) forest absorption observed in the spectra of
high-redshift quasars has been an important probe of large-scale structure and the
intergalactic medium (IGM) at z > 2. The fundamental quantity of interest in the Lyα forest is
its local optical depth to absorption, τ(x), at position x. This is not a directly observed
quantity: it is derived from the flux transmission F = e^(−τ), which requires knowledge of the
intrinsic quasar continuum C(λ_rest) in order to be extracted from the observed flux,
F = f(λ_obs)/C(λ_rest), where f(λ_obs) is the observed flux and λ_rest = λ_obs/(1 + z_QSO) is
the restframe wavelength of the quasar at redshift z = z_QSO. The error in the measured optical
depth, δτ, scales roughly with the fractional continuum error, δτ ~ δC/C, which means that
accurate continuum fitting is important in the optically thin Lyα forest, where τ < 1.
Therefore, accurate estimates of the underlying quasar continuum are crucial in order to take
full advantage of modern Lyα forest data sets.
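
As a rough numerical illustration of the scaling δτ ~ δC/C (not part of the original
analysis), the sketch below evaluates τ = −ln(f/C) for a single hypothetical pixel and shows
how a fractional continuum error propagates into the optical depth. The pixel values and the
function names are assumptions made for this example only.

```python
import math

def optical_depth(observed_flux, continuum):
    """Optical depth from the transmission F = f_obs / C = exp(-tau)."""
    return -math.log(observed_flux / continuum)

def tau_error(observed_flux, true_continuum, frac_err):
    """Shift in tau when the continuum is overestimated by a fraction frac_err."""
    biased_continuum = true_continuum * (1.0 + frac_err)
    return (optical_depth(observed_flux, biased_continuum)
            - optical_depth(observed_flux, true_continuum))

# Hypothetical optically thin pixel with F = 0.8 (tau ~ 0.22).
for frac_err in (0.02, 0.05, 0.15):
    print(f"dC/C = {frac_err:.2f} -> d(tau) = {tau_error(0.8, 1.0, frac_err):.3f}")
```

The shift in τ equals ln(1 + δC/C), independent of the flux level, so for an optically thin
pixel even a modest continuum error is a large fraction of τ itself.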

While virtually all aspects of Lyα forest science depend on the continuum determination, some
are more sensitive than others. For example, the Lyα flux probability distribution, which is
used to constrain the IGM temperature-density relation (TDR), is notoriously sensitive to
continuum errors: Lee (2011) has recently found that a 2% systematic error in the continuum
estimation can double the errors in the TDR even with high signal-to-noise data. On the other
hand, measurements of the 1-dimensional flux power spectrum P_F(k) and other 2-point
statistics (e.g. threshold clustering functions, Lee & Spergel 2011) are affected by the
Fourier power introduced by intrinsic quasar emission lines in the Lyα forest region. For
example, unaccounted variance in the quasar continuum has limited the measurement of the
1-dimensional Lyα forest flux power spectrum (McDonald et al. 2006) to comoving scales of
r < 40 h⁻¹ Mpc at z = 2.5. Ongoing attempts to measure the baryon acoustic oscillation (BAO)
feature in the Lyα forest using transverse correlations across lines-of-sight are less
sensitive to continuum errors (McDonald & Eisenstein 2007); however, large continuum errors
will still degrade the significance of the BAO signal measured in this fashion.

Unfortunately, accurate quasar continuum fitting is a non-trivial problem. At the redshifts
(z > 2) at which the Lyα forest becomes accessible to ground-based optical telescopes, the
high absorber line-density makes it challenging to identify the intrinsic quasar continuum.
With high-resolution and high signal-to-noise (S/N) quasar spectra obtained from large
telescopes, continua are usually fitted using some form of spline-fitting to the observed
transmission peaks of the forest, if one believes that the transmission peaks truly reach the
quasar continuum at a given redshift (but see Faucher-Giguère et al. 2008; Lee 2011). These
direct-fitting methods cannot, in general, be applied to large data sets such as the Sloan
Digital Sky Survey, as the modest resolution and low S/N make it impossible to directly fit the
Lyα forest except for the very highest S/N subsamples, and even then steps need to be taken to
account for the degradation of transmission peaks from the lower resolution (see, e.g.,
Dall'Aglio et al. 2009). Moreover, direct fitting techniques usually require significant human
intervention and are often very time-consuming, precluding their application to the ~10⁴ Lyα
forest sightlines in SDSS.

Noisy Lyα forest data usually require some form of extrapolation from the relatively
unabsorbed spectrum redwards³ of the quasar's Lyα λ1216 broad emission line. The simplest way
to do this is to fit a power-law, f_ν ∝ ν^α, to spectral regions uncontaminated by quasar
emission lines redwards of Lyα, or even to eschew fitting individual spectra and extrapolate a
mean power-law from λ_rest > 1216 Å, using power-law values from the literature (from, e.g.,
Vanden Berk et al. 2001).

There are two major issues with power-law estimation of Lyα forest continua. Firstly, there is
a break in the underlying quasar power-law at λ_rest ~ 1200 Å - 1300 Å. This was first
identified by Zheng et al. (1997) from a study of low-redshift (z_QSO ~ 1) quasars observed in
the ultra-violet, in which the Lyα forest continuum could be clearly identified due to the low
Lyα line density at those epochs. A subsequent study by Telfer et al. (2002) found mean
power-law indices of ⟨α_NUV⟩ = −0.69 and ⟨α_EUV⟩ = −1.76 redwards and bluewards of
λ_rest ~ 1200 Å, respectively. This implies that a naive power-law extrapolation from
λ_rest > 1216 Å would underestimate the true Lyα forest continuum by ~10%. Furthermore,
Telfer et al. (2002) found a large scatter in α_NUV and α_EUV from the individual quasars in
their sample, with no correlation between the two; this increases the error from power-law
extrapolation in individual Lyα forest spectra. While Desjacques et al. (2007) and Pâris et al.
(2011) have discussed this EUV-NUV power-law break in the context of Lyα forest continuum
estimation, it is often ignored in Lyα forest analyses.

Secondly, the Lyα forest 'continuum' (usually defined around λ_rest ~ 1040 Å - 1180 Å)
includes weak emission lines such as Fe II λ1071 and Fe II/Fe III λ1123, although the exact
identifications vary from author to author. These emission lines can cause deviations of up to
~10% from a flat continuum. It is possible to take these features into account on average: for
example, Bernardi et al. (2003) modeled them as two Gaussian functions superposed on top of an
underlying power-law. However, there is a great diversity in the shape and equivalent width of
these weak emission lines (Suzuki 2006). Therefore, the use of an average continuum shape would
not account for variations of up to 10% within individual quasars due to the presence of these
emission lines.

One possible avenue for improved quasar continuum fits is Principal Component Analysis (PCA).
Suzuki et al. (2005) explored this using a sample of 50 low-redshift quasars observed in the
UV by the Hubble Space Telescope (HST), in which the λ_rest < 1216 Å continuum can be clearly
identified. They concluded that while PCA fits to the red side (λ_rest = 1216 - 1600 Å) of
individual spectra gave a good prediction of the Lyα continuum shape (i.e. the weak emission
lines), the overall continuum amplitude had ~10% errors. This is presumably due to the EUV-NUV
power-law break. Pâris et al. (2011) recently carried out a similar analysis on a high-S/N
(S/N > 10 per pixel) subsample of the

3 In this paper, we use the terms 'blue' and 'red' relative to the 
quasar Lya emission line unless otherwise noted 



Sloan Digital Sky Survey (SDSS) quasar sample. They found a better prediction accuracy of ~5%,
possibly due to their larger spectral baseline (λ_rest ~ 1025 Å - 2000 Å as opposed to
λ_rest ~ 1025 Å - 1600 Å in the earlier work).

However, the standard PCA formalism does not take pixel noise into account, whereas the
majority of quasars observed in SDSS have low signal-to-noise (S/N < 10). Francis et al. (1992)
have argued that PCA fitting errors scale directly with the noise level, which implies that,
e.g., one can expect no better than ~20% continuum accuracy in a S/N = 5 spectrum using PCA,
even without taking the power-law break into account.

For all the reasons outlined above, more accurate continuum-fitting methods are sorely needed
to take full advantage of Lyα forest data from SDSS and future spectroscopic surveys. In this
paper, we will explore a refinement of the PCA technique which we term 'mean-flux regulated
PCA' (MF-PCA). Briefly, we carry out least-squares fitting of PCA templates to the unabsorbed
quasar spectrum redwards (λ_rest > 1216 Å) of the Lyα line in order to obtain a prediction for
the continuum shape, and then use the expected mean flux, ⟨F⟩(z), to constrain the amplitude of
the fitted continuum. Tytler et al. (2004) have shown that the dispersion in ⟨F⟩ expected from
a Δz = 0.1 segment of Lyα forest at z = 2 is σ(Δz = 0.1) ≈ 4%. When averaged across an entire
Lyα forest sightline (which spans Δz ≈ 0.4 for a quasar at z_QSO = 2.5), one expects the
continuum amplitude to be predicted to ~2%.
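
The ~2% figure can be reproduced with a back-of-the-envelope estimate. The sketch below
assumes, as a simplification of ours, that the Δz = 0.1 segments along a sightline are
statistically independent, so that the dispersion falls as the square root of the number of
segments. It also computes the forest path length for the λ_rest = 1041 - 1185 Å range used
later in the paper; the function names are hypothetical.

```python
import math

LYA_REST = 1216.0  # Angstrom, rest wavelength of Lya as quoted in the text

def forest_path_length(z_qso, rest_min=1041.0, rest_max=1185.0):
    """Absorption-redshift interval spanned by the forest region of one quasar."""
    z_low = rest_min * (1.0 + z_qso) / LYA_REST - 1.0
    z_high = rest_max * (1.0 + z_qso) / LYA_REST - 1.0
    return z_high - z_low

def sightline_dispersion(sigma_segment, dz_segment, dz_sightline):
    """Dispersion of the mean flux averaged over a sightline, assuming
    independent segments (sigma / sqrt(N))."""
    n_segments = dz_sightline / dz_segment
    return sigma_segment / math.sqrt(n_segments)

dz = forest_path_length(2.5)
print(f"forest path length for z_QSO = 2.5: dz = {dz:.2f}")
print(f"expected amplitude scatter: {sightline_dispersion(0.04, 0.1, 0.4):.3f}")
```

Correlations between neighbouring segments would make the true scatter somewhat larger than
this independent-segment estimate.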

This paper is organized as follows: § 2 describes the publicly available SDSS quasar sample
which will be the initial subject of our new technique. § 3 elucidates the MF-PCA technique,
which is then tested on mock spectra in § 4. We will then discuss the results of our continuum
fitting and future improvements. The continuum fits have been made publicly available and can
be downloaded via anonymous FTP.

2. DATA 

The MF-PCA technique which we develop in this paper is optimized towards large sets of noisy
Lyα forest spectra. We will apply this technique to the Sloan Digital Sky Survey data, which
comprise ~10⁴ Lyα forest sightlines at moderate resolution (R ≈ 2000) and modest
signal-to-noise (S/N ~ few per pixel).

This section provides an overview of the SDSS Lyα forest data sample, and also the two sets of
quasar templates which we will use to fit this data set.

2.1. SDSS DR7 Lya Forest Sample 

In this paper, we carry out continuum-fitting for publicly-available spectra from the final
SDSS Data Release 7 (DR7) quasar catalog (Schneider et al. 2010), which comprises 105,783
spectroscopically confirmed quasars observed with the 2.5m SDSS Telescope at Apache Point, NM.
The spectra cover the observed wavelength range λ_obs = 3800 Å - 9200 Å with a spectral
resolution of R ≡ λ/Δλ ≈ 2000.

From this overall catalog, we select a subsample suitable for Lyα forest studies. First, we
require that some portion of the quasar Lyα forest region, λ_rest =

[Figure 1 panel titles: (a) Redshifts of 12,069 z > 2.3 SDSS DR7 QSOs; (b) S/N of 12,069
z > 2.3 SDSS DR7 QSOs]


Fig. 1. - (a) Redshift distribution of the SDSS DR7 quasars fitted in this paper, in
Δz_QSO = 0.1 bins. We have selected objects with z_QSO > 2.3, which have reasonable coverage
of the Lyα forest. 53 quasars with z_QSO > 5.0 are not shown in this plot. The gaps at
z_QSO ~ 2.7 and z_QSO ~ 3.5 are where the quasar colors cross the stellar locus, making it
difficult to select quasars with these redshifts (see, e.g., Richards 2002). (b) Median S/N per
pixel in our quasar sample, in the Lyα forest (λ_rest = 1041 Å - 1185 Å; black solid lines) and
redwards of the quasar Lyα emission line (λ_rest = 1225 Å - 1600 Å; red dashed lines). The
histograms are in bins of ΔS/N = 0.5. Note that the majority of the Lyα forest sightlines have
S/N < 10 pixel⁻¹.



1041 Å - 1185 Å, be within the observed wavelength range. Since the extreme blue end (near
λ_obs ~ 3800 Å) of the SDSS spectra is known to suffer from spectrophotometric problems, we
use λ_obs = 3840 Å as the lower wavelength limit. This sets a minimum quasar redshift of
z_QSO = 2.3. For the quasars that satisfy this redshift criterion, we excise the portions of
the spectra that lie below λ_obs = 3840 Å. In addition, broad absorption line (BAL) quasars
have continua which are difficult to characterize (although see Allen et al. 2011 for a method
to recover quasar continua from BALs); we therefore discard quasars flagged as BALs in the
Shen et al. (2011) value-added quasar catalog.

There are 13,133 quasars in the DR7 quasar catalog which satisfy the above criteria. We make
further quality cuts by discarding 962 spectra which have SPPIXMASK = -12 bitmasks set (this
signifies issues with the fiber; see Stoughton et al. 2002 for further details on the SDSS
bitmask system), and 2 spectra where the signal-to-noise was too low to normalize the spectra,
leading to negative normalizations. This leaves us with a sample of 12,069 spectra, to which we
will apply the MF-PCA technique. Within individual spectra, we mask pixels which have either
zero inverse-variance or the SPPIXMASK = 16-28 maskbits set. This avoids the use of
problematic pixels, such as rejected extractions, bright sky-lines or bad flats.
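
The per-pixel masking rule can be sketched as follows. This is a minimal illustration, not
the pipeline code: it assumes that "SPPIXMASK = 16-28" refers to bits 16 through 28
(inclusive) of an integer mask value, and the function and variable names are hypothetical.

```python
# Bits 16..28 inclusive, combined into one integer mask (our reading of the text).
BAD_PIXEL_BITS = sum(1 << bit for bit in range(16, 29))

def is_good_pixel(inverse_variance, mask_value):
    """A pixel is kept if it has non-zero inverse variance and none of the
    bad-pixel bits are set in its mask value."""
    return inverse_variance > 0 and (mask_value & BAD_PIXEL_BITS) == 0

# Hypothetical pixels: (inverse variance, mask value)
pixels = [(1.2, 0), (0.0, 0), (0.8, 1 << 20), (0.5, 1 << 3)]
print([is_good_pixel(ivar, mask) for ivar, mask in pixels])
```

Bits below 16 are ignored by this per-pixel test; in the text they are only used for the
spectrum-level quality cut.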

The median signal-to-noise in the sample is S/N = 3.0 per 69 km s⁻¹ SDSS pixel within the Lyα
forest, and S/N = 6.2 per pixel in the λ_rest = 1225 Å - 1600 Å wavelength region. The redshift
and signal-to-noise⁵ distributions of our final quasar sample are shown in Figure 1. It is
clear that the Lyα forest data from DR7 are noisy. Most of the sightlines have median S/N < 10
within the Lyα forest, which is too noisy to be fitted individually



5 Henceforth, all signal-to-noise values quoted in this paper are the median values per
69 km s⁻¹ pixel, in the range λ_rest = 1225 Å - 1600 Å, unless indicated otherwise.



using existing techniques. 

In addition, we need to deal with Damped Lyα Absorbers (DLAs) within the spectra. These are
absorbing systems with neutral hydrogen column densities of N_HI > 2 × 10²⁰ cm⁻² which result
in complete absorption over large portions (Δv ~ 10³ km s⁻¹) of affected sightlines. Since the
MF-PCA technique (§ 3) fits the amplitude of the quasar continuum based on the mean flux of the
low column-density Lyα forest, the excess absorption of a DLA within a sightline would bias the
continuum estimate.

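To correct for this, we use a catalog of 1427 DLAs identified in the SDSS DR7 spectra by
Noterdaeme et al. (2009). First, we mask the wavelength region corresponding to the equivalent
width of each DLA (Draine):

$$ W \approx \lambda_\alpha \left[ \frac{e^2}{m_e c^2}\, N_{\rm HI}\, f_\alpha\, \lambda_\alpha \left( \frac{\gamma_\alpha \lambda_\alpha}{c} \right) \right]^{1/2}, \tag{1} $$
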



where λ_α = 1216 Å is the rest-frame wavelength of the hydrogen Lyα transition, e is the
electron charge, m_e is the electron mass, c is the speed of light, N_HI is the H I column
density of the DLA, f_α is the Lyα oscillator strength, and γ_α is the sum of the Einstein A
coefficients for the transition.

However, the damping wings of each DLA extend beyond the equivalent width, contributing a
small but non-negligible excess absorption to the pixels close to the DLA. We correct for this
by multiplying each pixel in the spectrum by exp(τ_wing(Δλ)), where

$$ \tau_{\rm wing}(\Delta\lambda) = \frac{e^2}{m_e c^2}\, \frac{f_\alpha N_{\rm HI} \lambda_\alpha}{4\pi} \left( \frac{\gamma_\alpha \lambda_\alpha}{c} \right) \left( \frac{\lambda_\alpha}{\Delta\lambda} \right)^{2}, \tag{2} $$

and Δλ = λ − λ_α is the wavelength separation in the DLA restframe.
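
To give a feel for the numbers, the sketch below evaluates the damped-line equivalent width
and the damping-wing optical depth in cgs units, following our reading of Equations 1 and 2.
The atomic data (f_α = 0.4164, γ_α = 6.265 × 10⁸ s⁻¹, λ_α = 1215.67 Å) are standard
literature values that we assume here; they are not quoted in the text, and the function
names are hypothetical.

```python
import math

# cgs constants; the Lya atomic data are assumed standard values, not from the text
E_CHARGE = 4.8032e-10       # esu
M_ELECTRON = 9.1094e-28     # g
C_LIGHT = 2.9979e10         # cm/s
LAMBDA_ALPHA = 1215.67e-8   # cm (the text rounds this to 1216 Angstrom)
F_ALPHA = 0.4164            # Lya oscillator strength
GAMMA_ALPHA = 6.265e8       # s^-1, damping constant

R_ELECTRON = E_CHARGE**2 / (M_ELECTRON * C_LIGHT**2)  # classical electron radius, cm

def dla_equivalent_width(n_hi):
    """Rest-frame equivalent width in Angstrom for column density n_hi (cm^-2)."""
    x = (R_ELECTRON * n_hi * F_ALPHA * LAMBDA_ALPHA
         * (GAMMA_ALPHA * LAMBDA_ALPHA / C_LIGHT))
    return LAMBDA_ALPHA * math.sqrt(x) * 1e8

def tau_wing(n_hi, delta_lambda):
    """Damping-wing optical depth at rest-frame separation delta_lambda (Angstrom)."""
    dl_cm = delta_lambda * 1e-8
    return (R_ELECTRON * F_ALPHA * n_hi * LAMBDA_ALPHA / (4.0 * math.pi)
            * (GAMMA_ALPHA * LAMBDA_ALPHA / C_LIGHT) * (LAMBDA_ALPHA / dl_cm) ** 2)

def correct_for_wing(flux, n_hi, delta_lambda):
    """Undo the wing absorption by multiplying the flux by exp(+tau_wing)."""
    return flux * math.exp(tau_wing(n_hi, delta_lambda))

n_hi = 2e20  # cm^-2, the DLA threshold column density
width = dla_equivalent_width(n_hi)
print(f"W = {width:.1f} Angstrom; tau_wing at 2W = {tau_wing(n_hi, 2 * width):.3f}")
```

With these expressions, τ_wing(Δλ) = W² / (4π Δλ²), so the wing optical depth depends only on
the separation in units of the equivalent width.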

2.2. Quasar Templates

In order to predict the shape of the Lyα forest continuum, we need a set of template spectra
with clearly identified continua at wavelengths λ_rest < 1216 Å. For this purpose, we will use
two different sets of quasar templates, derived from quasars observed with the Hubble Space
Telescope (HST) and from SDSS itself.

Suzuki et al. (2005) derived PCA templates from 50 quasars that had been observed by the Faint
Object Spectrograph (FOS) on the Hubble Space Telescope in the ultraviolet. At the low
redshifts (0.14 < z_QSO < 1.04) of these quasars, the line density of the Lyα forest is
sufficiently small that the quasar continuum could be clearly identified. This enabled the
creation of templates in the range λ_rest = 1025 Å - 1600 Å, covering Lyβ λ1025 to C IV λ1549.

Pâris et al. (2011) recently carried out a similar study, applying the techniques in Suzuki et
al. (2005) to a subsample of 78 SDSS DR7 quasars. These z_QSO ~ 3 quasars were selected to
have full coverage of the Lyα forest and relatively high signal-to-noise (S/N > 10 pixel⁻¹).
The transmission peaks in the Lyα forest were hand-fitted with a low-order spline function to
provide a continuum estimate. PCA templates were then derived in the spectral range
λ_rest = 1020 Å - 2000 Å, which included the C III λ1906 line. While this process might give a
biased continuum level due to the low resolution and S/N of the templates, it should provide a
good description of the relative shape of the quasar continuum, which is what MF-PCA requires:
the mean-flux regulation process is designed to correct for uncertainties in the overall
continuum level arising from pure PCA fitting.

We do not expect the redshift differences between the template and the DR7 quasars to be a
significant issue, even for the Suzuki et al. (2005) quasars (⟨z_QSO⟩ ~ 0.6). This is because
various studies (e.g. Vanden Berk et al. 2004; Fan 2006) have suggested that there is little
redshift evolution of quasar spectra. However, the shape of quasar spectra is known to have a
significant luminosity dependence, such as the well-known Baldwin effect (Baldwin et al. 1978),
which is the anti-correlation between the strength of the C IV λ1549 emission line and the
quasar luminosity. It is therefore reasonable to suppose that there would be a significant
difference in the spectral shapes represented by the two templates and the overall SDSS sample.
This is because the Suzuki et al. (2005) sample comprises relatively low-luminosity, nearby
quasars, as opposed to the Pâris et al. (2011) quasars, which were selected to have high S/N
and are therefore a luminous subsample of the SDSS quasars.

In order to compare the relative luminosities, we calculate λL_1280, the intrinsic
monochromatic luminosity near λ_rest = 1280 Å, for the quasars in our SDSS DR7 sample as well
as the two template samples. We assume a standard ΛCDM cosmology with h = 0.7, Ω_m = 0.28, and
Ω_m + Ω_Λ = 1. The respective distributions of λL_1280 are shown in Figure 2. The SDSS DR7
quasars have a typical luminosity of λL_1280 ~ 10^46.3 ergs/s, while the Suzuki et al. (2005)
and Pâris et al. (2011) quasars are about 0.5 dex fainter and brighter, respectively. However,
the combined luminosity distributions of the Suzuki et al. (2005) and Pâris et al. (2011)
template quasars significantly overlap the full range of SDSS DR7 quasars, which justifies the
use of both templates

Fig. 2. - Intrinsic luminosity distribution of quasars from SDSS DR7 (12,069 spectra; black
solid line), Suzuki et al. (2005) (50 spectra; blue dot-dashed line), and Pâris et al. (2011)
(78 spectra; red dashed line), as estimated from λL_1280. The histograms have bin widths of
Δlog₁₀(λL_1280) = 0.25, and are normalized such that the sum of all the bins in each histogram
is unity.



in this paper. 



3. METHOD 



The mean-flux regulated PCA (MF-PCA) fitting method described in this paper is essentially a
two-step process: (1) fitting the red side (λ_rest > 1216 Å) of the individual quasar spectra
using PCA templates, to predict the shape of the weak emission lines in the Lyα forest
continuum; followed by (2) constraining the amplitude of the predicted Lyα forest continuum to
be consistent with existing measurements of the mean-flux evolution of the Lyα forest, ⟨F⟩(z).

3.1. Least-squares PCA Fitting 

The basic concept of principal component analysis (PCA) is that a normalized quasar spectrum,
f(λ), can be represented as

$$ f(\lambda) \approx \mu(\lambda) + \sum_{j} c_j\, \xi_j(\lambda), \tag{3} $$

where μ(λ) is the mean quasar spectrum, ξ_j(λ) is the jth principal component or
'eigenspectrum', and c_j are the weights for an individual quasar. The formalism for deriving
the eigenspectra and weights is described in Suzuki et al. (2005) and Pâris et al. (2011).

The standard PCA formalism for deriving the weights, c_j, does not take into account spectral
noise, which renders it unsuitable for noisy SDSS spectra (see Figure 1b). Instead, we first
carry out a least-squares fit to the red side of each spectrum using the full
λ_rest ~ 1000 Å - 1600 Å eigenspectra as a basis. Due to the correlation between the weak
emission lines within the Lyα forest and in λ_rest ~ 1300 Å - 1500 Å (Suzuki et al. 2005), we
expect this to provide a reasonable prediction for the shape of the continuum.
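
The idea of fitting only the red side and projecting the fitted weights onto the full
wavelength range can be sketched with a noise-weighted linear least-squares fit. This is a
simplified illustration under our own assumptions: it uses NumPy's `lstsq` rather than the
fitting routine used in the paper, omits the additional power-law and redshift-correction
parameters introduced below, and runs on synthetic stand-in "eigenspectra" rather than the
published templates.

```python
import numpy as np

def fit_pca_weights(wave, flux, sigma, mean_spec, eigenspectra, red_min=1216.0):
    """Noise-weighted least-squares fit of the weights c_j using red-side pixels only.

    eigenspectra has shape (n_components, n_pixels). Returns the weights and the
    predicted continuum over the full wavelength range (red and blue sides).
    """
    red = (wave > red_min) & (sigma > 0)
    # Dividing by sigma turns the chi-square minimisation into ordinary least squares.
    design = eigenspectra[:, red].T / sigma[red][:, None]
    target = (flux[red] - mean_spec[red]) / sigma[red]
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    continuum = mean_spec + weights @ eigenspectra
    return weights, continuum

# Synthetic stand-ins for the mean spectrum and two eigenspectra (hypothetical).
rng = np.random.default_rng(0)
wave = np.linspace(1020.0, 1600.0, 581)
mean_spec = 1.0 + 0.5 * np.exp(-0.5 * ((wave - 1216.0) / 20.0) ** 2)
eigenspectra = np.vstack([np.sin(wave / 50.0), np.cos(wave / 80.0)])
true_weights = np.array([0.07, -0.03])
true_continuum = mean_spec + true_weights @ eigenspectra

sigma = np.full_like(wave, 0.05)
noisy_flux = true_continuum + rng.normal(0.0, sigma)
weights, continuum = fit_pca_weights(wave, noisy_flux, sigma, mean_spec, eigenspectra)
print("recovered weights:", weights)
```

In real data the blue-side flux is absorbed by the forest, which is why only red-side pixels
enter the fit while the prediction is evaluated everywhere.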

As described in § 2.2, we have two separate sets of PCA eigenspectra from Suzuki et al. (2005)
and Pâris et al. (2011). In principle, one could combine the two sets of quasar templates to
generate one set of eigenspectra

[Figure 3 panel titles: Plate 292, Fiber 406, MJD = 51609 (S/N = 3.57); Plate 2755, Fiber 140,
MJD = 54507 (S/N = 5.39); Plate 915, Fiber 40, MJD = 52443 (S/N = 6.89); Plate 2087, Fiber 171,
MJD = 53415 (S/N = 10.91)]

Fig. 3. - Successful examples of our least-squares PCA fitting method on SDSS quasar spectra
with different S/N. In each plot, we show the observed flux (orange), pipeline noise (green),
C_init, the PCA fit from the first line-masking iteration (black dashed line), and the final
PCA fit, C_PCA (black solid line). Crosses indicate pixels which have been discarded by our
absorption line-masking scheme. The vertical dashed lines indicate λ_rest = 1216 Å in the
quasar restframe; all fitting is carried out redwards of this wavelength. The median S/N value
quoted is evaluated redwards of the Lyα emission line, and the absolute flux error, |δF|, is
defined in Equation 4. Note that the amplitude of the Lyα forest continuum (λ_rest < 1216 Å)
is not well-fitted by the PCA procedure, and will need to be corrected in the mean-flux
regulation step (§ 3.2).



which would encompass the diversity of both template samples. However, the template spectra
from Pâris et al. (2011) are not available to us at the time of writing, so we will carry out
our fitting procedure separately for the two sets of PCA eigenspectra. Suzuki et al. (2005)
found that out of their 10 principal component eigenspectra, only the first 8 components
appeared to describe physical features in the spectra, while the 9th and 10th components
seemed to describe mostly noise. Therefore, we will use only 8 components from each set of
eigenspectra for our fits. In addition, for the sake of consistency we limit ourselves to the
rest-wavelength range λ_rest = 1020 Å - 1600 Å of each eigenspectrum, even though the Pâris et
al. (2011) eigenspectra extend up to λ_rest = 2000 Å.

However, we have found that fitting the SDSS spectra with just the PCA weights c_j was
insufficient to account for the large diversity of the sample. Therefore, we introduce 2
additional fit parameters: a power-law component, α_λ, and a redshift-correction factor, c_z.
The power-law component, α_λ, is necessary due to the large range of slopes found in the SDSS
quasars. Even though the 3rd through 5th principal components in the HST eigenspectra include
the spectral slope, they also describe some emission-line features; the introduction of α_λ as
a free parameter allows an additional degree of freedom and enables a better fit to the
emission lines and slope simultaneously. Due to this degeneracy between the slopes within the
eigenspectra and α_λ, we do not interpret the latter as the slope of the underlying quasar
power-law continuum. The power-law parameter also helps account for low-order
spectrophotometric errors as well as dust extinction in the spectrum.

The redshift-correction factor, c_z = λ_rest^fit / λ_rest^pipe, translates the spectrum along
the wavelength axis from the rest wavelength given by the pipeline redshift, λ_rest^pipe, to a
best-fitting rest wavelength, λ_rest^fit. This is required as the SDSS pipeline redshifts are
not completely accurate (see, e.g., Hewett & Wild 2010). However, due to the asymmetry and
velocity shifting of quasar emission lines at different redshifts, we do not necessarily
interpret

TABLE 1
Free Parameters in MF-PCA Continuum Fits

Fit Parameter    Description
-------------    -----------
f_1280           Flux normalization, evaluated at λ_rest ~ 1280 Å
c_z              Redshift correction factor
α_λ              Power-law exponent
c_1 ... c_8      PCA coefficients
a_MF             Linear mean-flux regulation coefficient
b_MF             Quadratic mean-flux regulation coefficient



Figure 5, which plots |δF| against the red-side signal-to-noise per pixel, S/N_red, for PCA
fits to a subset of the SDSS spectra as well as the mock spectra described in § 4.

Figure 5 provides a useful diagnostic for the quality of the PCA fits on the SDSS spectra. We
expect the fits to the mock spectra (green crosses) to represent the case in which the PCA
eigenspectra describe the spectra nearly perfectly (see § 4); therefore they typically have
smaller values of |δF| than the real spectra. Clearly, the SDSS spectra with large |δF| are
most likely bad fits, but note that the presence of metal absorption lines and other artifacts
in the real data can bias |δF| to larger values even for good fits (we did not mask any lines
when calculating |δF|, as our metal-masking algorithm is rudimentary and sometimes masks
legitimate pixels).

In practice, we carry out the PCA fitting procedure using the two different PCA templates
described in § 2.2, then for each SDSS spectrum we select the fit which gives the lower value
of |δF|. Fits with |δF| values under the 95th percentile of those from the mock spectra (red
line in Figure 5) are then automatically considered good fits, while the rest are visually
inspected and flagged for goodness-of-fit on the red side of the spectrum. Approximately 90%
of the spectra were adequately fit. The spectra which are not well-fitted by our procedure
consist mostly of objects which have strong absorption systems at λ_rest > 1216 Å, such as
metal absorption from DLAs and weak BAL quasars. An example of this is shown in Fig. 4a. There
are also quasars with unusual spectral shapes which are not represented in the template
spectra described in § 2.2, such as quasars with weak emission lines (Fig. 4b).

PCA fits have now been carried out on the spectra redwards of Lyα, but the predicted
continuum, C_PCA, extends bluewards of Lyα (λ_rest < 1216 Å). For the objects which are
well-fitted on the red side of the spectrum, we expect the predicted continua to provide a
reasonable prediction for the shape of the Lyα continuum bluewards of Lyα, but the overall
amplitude is uncertain due to the EUV-NUV power-law break described in the Introduction. We
now turn to the next fitting step, mean-flux regulation, to constrain the continuum amplitude.

3.2. Mean-Flux Regulation 

In the least-squares PCA fitting step described in the previous section, we have hitherto used
no information bluewards of the quasar Lyα emission line due to the absorption from the Lyα
forest. However, since the absorption redshift at any point in the Lyα forest is known
(z_abs = λ_obs/1216 Å − 1), the absorption averaged over each sightline can be used to
constrain the predicted continuum. In this section, we will describe the use of the mean-flux
evolution of the Lyα forest, ⟨F⟩(z), to regulate the amplitude and slope of the predicted PCA
continuum. We refer to this as the 'mean-flux regulation' step.
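
The principle behind this step can be illustrated with a deliberately simplified,
amplitude-only version: compare the mean transmission extracted with a trial continuum to an
external expectation, and rescale the continuum so that the two agree. The sketch below is
our own illustration rather than the fitting procedure of this paper (which also adjusts the
slope), and `expected_mean_flux` is a hypothetical callable standing in for an external
⟨F⟩(z) measurement.

```python
LYA_REST = 1216.0  # Angstrom, as quoted in the text

def absorption_redshift(lambda_obs):
    """Redshift of Lya absorption observed at wavelength lambda_obs (Angstrom)."""
    return lambda_obs / LYA_REST - 1.0

def mean_transmission(flux, continuum):
    """Mean of f / C over the pixels of one sightline."""
    ratios = [f / c for f, c in zip(flux, continuum) if c > 0]
    return sum(ratios) / len(ratios)

def rescale_continuum(flux, continuum, lambda_obs, expected_mean_flux):
    """Scale the continuum so the extracted mean transmission matches the
    expectation averaged over the same pixels (amplitude-only correction)."""
    measured = mean_transmission(flux, continuum)
    expected = sum(expected_mean_flux(absorption_redshift(lam))
                   for lam in lambda_obs) / len(lambda_obs)
    scale = measured / expected
    return [c * scale for c in continuum]

# Toy sightline: the trial continuum is too low, so the extracted <F> is too high.
flux = [0.8, 0.8]
trial_continuum = [1.0, 1.0]
lambda_obs = [3900.0, 3950.0]
regulated = rescale_continuum(flux, trial_continuum, lambda_obs, lambda z: 0.7)
print(mean_transmission(flux, regulated))
```

A continuum that is too low yields a transmission that is too high, so the correction factor
is the ratio of measured to expected mean flux.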

Using the PCA continuum C_PCA(λ_rest) fitted to the observed spectrum, we first extract the
Lyα forest transmission F_init(λ_rest) = f(λ)/C_PCA(λ_rest) in the range
λ_rest = 1041 Å - 1185 Å. The extracted Lyα forest is then
divided into bins, and the mean flux, F̄(λ_bin), is eval-



c_z as a true redshift correction; it is merely an ad hoc parameter to obtain the best
possible fit to the spectrum. The full list of free parameters for our continuum-fitting
procedure is shown in Table 1 (a_MF and b_MF are free parameters for the mean-flux regulation
step, described in § 3.2).


We are now in a position to carry out the fitting procedure. First, the quasar spectrum is
shifted to the quasar restframe using the pipeline redshift, and normalized at
λ_rest = 1275 - 1285 Å. We then use the least-squares fitting routine MPFIT (Markwardt 2009)
to find the best-fitting set of parameters, [c_z, α_λ, c_j], given the spectrum and its noise.
The initial fits from this procedure are represented by the black dashed lines in the examples
shown in Figure 3.

However, while the intrinsic quasar spectrum is generally well-defined redwards of Lyα in the
SDSS spectra, in many cases intervening metal absorption lines can be seen in the spectrum in
the λ_rest ~ 1216 - 1600 Å wavelength range. To prevent these absorption features from biasing
the PCA fitting, we carry out a simple iterative procedure to mask these absorption lines:
using the continuum, C_init, obtained from the initial least-squares fit, we mask pixels in
which f(λ) − C_init(λ) < −2.5σ(λ), where f(λ) and σ(λ) are the observed spectrum and pipeline
noise, respectively. We then make a new PCA fit and repeat this process until the fit
converges. In Figure 3, the final PCA fits, C_PCA, are shown as black solid lines while the
masked pixels are denoted by crosses.
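
The iterative masking loop can be sketched as below. This is a minimal illustration under
stated assumptions: the fitting step is abstracted as a callable (here a weighted
straight-line fit stands in for the PCA fit), the mask is made cumulative so that the loop is
guaranteed to terminate, and convergence is declared when no new pixels are masked. None of
these details are specified in the text.

```python
import numpy as np

def line_fit(wave, flux, sigma, good):
    """Stand-in for the PCA fit: weighted straight-line fit to unmasked pixels."""
    coeffs = np.polyfit(wave[good], flux[good], 1, w=1.0 / sigma[good])
    return np.polyval(coeffs, wave)

def iterative_mask_fit(wave, flux, sigma, fit_func, threshold=2.5, max_iter=20):
    """Refit while masking pixels lying more than threshold*sigma below the fit."""
    good = sigma > 0
    continuum = fit_func(wave, flux, sigma, good)
    for _ in range(max_iter):
        absorbed = (flux - continuum) < -threshold * sigma
        new_good = good & ~absorbed
        if np.array_equal(new_good, good):
            break  # no new pixels masked: the fit has converged
        good = new_good
        continuum = fit_func(wave, flux, sigma, good)
    return continuum, good

# Toy red-side spectrum: flat continuum with one absorption line (noise-free).
wave = np.linspace(1300.0, 1500.0, 100)
flux = np.ones_like(wave)
flux[40:45] = 0.5
sigma = np.full_like(wave, 0.05)

continuum, good = iterative_mask_fit(wave, flux, sigma, line_fit)
print("masked pixels:", np.flatnonzero(~good))
```

Only negative deviations are masked, since absorption lines can only lower the flux relative
to the continuum; positive outliers are left to the noise weighting.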

From Figure 3, we see that the least-squares PCA fitting procedure generally works well, even
with noisy (S/N ~ few) spectra. The fit to the Lyα λ1216 emission line is sometimes imperfect,
but this is unsurprising considering that the fitted range (λ_rest = 1216 - 1600 Å) only takes
partial account of the line. Comparing the initial (dashed line) and final (solid line) fits in
Figure 3, we see that the red-side metal absorption lines usually have little effect on the
fits, but in certain cases (e.g. Figures 3(b) and (c)) absorption line-masking noticeably
improves the fit.

We use the absolute flux error to quantify the goodness 
of the PCA fits redwards of Lya: 



fx" 



C PCA (A)-/(A) 
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lSFl = — rJll ' (4) 

where $C_{\rm PCA}(\lambda)$ is the fitted continuum and $f(\lambda)$ is
the observed spectrum smoothed by a 15-pixel boxcar, to avoid biasing
$|\delta F|$ in noisy spectra. $\lambda_{\rm max} = 1600$ Å and
$\lambda_{\rm min} = 1225$ Å represent the range over which we calculate
$|\delta F|$.
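
As an illustration, this statistic could be evaluated as in the Python sketch below. It assumes the fractional form of $|\delta F|$ (absolute difference divided by the fitted continuum, averaged over wavelength) and an input spectrum that extends at least half a boxcar width beyond the integration limits, so that the smoothing is not affected by the array edges. The function name and signature are our own.

```python
import numpy as np

def absolute_flux_error(wave, flux, continuum,
                        lam_min=1225.0, lam_max=1600.0, box=15):
    """|delta F| between lam_min and lam_max (rest-frame Angstrom)."""
    kernel = np.ones(box) / box
    smooth = np.convolve(flux, kernel, mode="same")   # 15-pixel boxcar
    sel = (wave >= lam_min) & (wave <= lam_max)
    w = wave[sel]
    frac = np.abs((continuum[sel] - smooth[sel]) / continuum[sel])
    # Trapezoidal integral of the fractional error, divided by the interval.
    integral = np.sum(0.5 * (frac[1:] + frac[:-1]) * np.diff(w))
    return integral / (w[-1] - w[0])
```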

The distribution of $|\delta F|$ in the fitted data is shown in

Fig. 4. Examples of SDSS quasar spectra in which the PCA fitting procedure fails to provide a reasonable fit redwards of the quasar
Lyα line. (a) A strong proximate Lyα absorber has decimated the quasar Lyα emission line, and its associated N V + C IV have introduced
broad absorption features to the $\lambda_{\rm rest} > 1216$ Å fitting region; the current algorithm is incapable of masking such strong absorption
features. (b) A weak emission-line quasar. The quasar templates described in § 2.2 do not include spectral shapes such as these.

Fig. 5. Dependence of the absolute flux error from the red-side PCA
fits, $|\delta F|$, on the S/N per pixel in the range
$\lambda_{\rm rest} = 1225 - 1600$ Å. This is plotted for random subsets
of 2000 SDSS spectra (black squares) and 2000 mock spectra described in
§ 4 (green crosses). The red line traces the 95th percentile of
$|\delta F|$ in the mock spectra; SDSS spectra with $|\delta F|$ smaller
than this are automatically considered good fits, while spectra above
this line are visually inspected to ensure fit quality.

evaluated for each bin, where
$\lambda_{\rm bin}$ = [1070 Å, 1110 Å, 1050 Å] are the central rest
wavelengths of each bin.

We now introduce a quadratic fitting function bluewards of a pivot
point, $\lambda_{\rm rest} = 1280$ Å, to obtain the mean-flux regulated
continuum:

$$C_{\rm MF}(\lambda_{\rm rest}) = C_{\rm PCA}(\lambda_{\rm rest}) \times \left(1 + a_{\rm MF}\,\hat{\lambda}_{\rm rest} + b_{\rm MF}\,\hat{\lambda}_{\rm rest}^{2}\right), \qquad (5)$$

where $a_{\rm MF}$ and $b_{\rm MF}$ are free parameters for the fit,
while $\hat{\lambda}_{\rm rest} = \lambda_{\rm rest}/1280\,{\rm Å} - 1$.
Note that for the lower redshifts ($z_{\rm QSO} \lesssim 2.4$) in which
only a limited portion of the Lyα forest is accessible, we use only the
linear parameter, $a_{\rm MF}$, in order to avoid over-fitting.

We again use least-squares fitting to find the values of $a_{\rm MF}$
and $b_{\rm MF}$ which provide the best fit between the extracted mean
flux, $\bar{F}(\lambda_{\rm bin})$, and the external mean-flux
constraint, $\langle F\rangle(z)$. In this mean-flux regulation step,
the parameters $c_z$, $\alpha_\lambda$ and $c_j$ fitted to
$\lambda_{\rm rest} > 1216$ Å are kept fixed. For the mean-flux
constraint, we use a double power law fitted to the mean-flux
measurement of Paris et al. (2011; kindly provided by Isabelle Paris):

$$\tau_{\rm eff}(z) = \begin{cases} (0.0031 \pm 0.0012) \times (1+z)^{3.49 \pm 0.31}, & z < 3.2 \\ (0.0011 \pm 0.0012) \times (1+z)^{4.21 \pm 0.70}, & z > 3.2 \end{cases} \qquad (6)$$

where $\tau_{\rm eff}(z) \equiv -\ln \langle F\rangle(z)$.
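
The mean-flux regulation step can be illustrated with the Python sketch below. It is not the fitting code used in this work: the bin edges are placeholder values (three equal bins over 1041 - 1185 Å), the correction factor is approximated as constant within each bin so that the problem becomes linear in $a_{\rm MF}$ and $b_{\rm MF}$, and only the central values of the double power law are used. All function names are our own.

```python
import numpy as np

LYA = 1216.0  # Lya rest wavelength in Angstrom, rounded as in the text

def tau_eff(z):
    """Central values of the double power law for the effective optical depth."""
    z = np.asarray(z, dtype=float)
    low = 0.0031 * (1.0 + z) ** 3.49
    high = 0.0011 * (1.0 + z) ** 4.21
    return np.where(z < 3.2, low, high)

def mean_flux(z):
    return np.exp(-tau_eff(z))

def regulate_continuum(wave_rest, flux, c_pca, z_qso,
                       edges=(1041.0, 1089.0, 1137.0, 1185.0), quadratic=True):
    """Return (C_MF, a_MF, b_MF) from a linearised mean-flux regulation."""
    x_c, target = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (wave_rest >= lo) & (wave_rest < hi)
        lam_c = 0.5 * (lo + hi)
        z_abs = lam_c * (1.0 + z_qso) / LYA - 1.0     # absorber redshift of bin
        ratio = np.mean(flux[sel] / c_pca[sel])       # mean flux relative to C_PCA
        x_c.append(lam_c / 1280.0 - 1.0)
        # Require ratio / (1 + a*x + b*x^2) = <F>(z_abs) at the bin centre.
        target.append(ratio / mean_flux(z_abs) - 1.0)
    x_c, target = np.array(x_c), np.array(target)
    design = np.column_stack([x_c, x_c ** 2] if quadratic else [x_c])
    params, *_ = np.linalg.lstsq(design, target, rcond=None)
    a_mf = params[0]
    b_mf = params[1] if quadratic else 0.0
    xhat = wave_rest / 1280.0 - 1.0
    corrected = c_pca * (1.0 + a_mf * xhat + b_mf * xhat ** 2)
    # The correction is applied only bluewards of the 1280 Angstrom pivot.
    return np.where(wave_rest < 1280.0, corrected, c_pca), a_mf, b_mf
```

Passing `quadratic=False` corresponds to the linear-only case used for the lowest-redshift quasars.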

In principle, the errors in the continuum fit should now 
be at the level of a few percent, arising from some com- 
bination of the large-scale variance in the Lya forest and 
errors in the fitting. In the next section, we will use mock 
spectra to quantify the level of continuum errors in the 
MF-PCA technique. 

4. TESTS ON MOCK SPECTRA 

The errors in the MF-PCA continuum fitting technique can be tested by
carrying out the above procedure on noisy mock spectra, and comparing
the fitted continuum with the 'true' continuum which is known by
construction. This testing process is also useful to check our
algorithm for bugs and efficiency. In this section, we will describe
the process of generating realistic mock spectra, and the quantitative
results of the MF-PCA technique. We will also make a comparison between
MF-PCA and the common technique of using the mean quasar continuum as
the Lyα forest continuum.

4.1. Generating Mock Spectra 

Fig. 6. Tests of the MF-PCA continuum prediction procedure on mock
spectra seeded with noise from real SDSS spectra. In each plot, we show
the noisy mock spectrum (orange); the pipeline noise (green) used to
generate the mock spectrum; $C_{\rm PCA}$, the least-squares PCA fit to
$\lambda_{\rm rest} = 1216 - 1600$ Å (black solid line); and
$C_{\rm MF}$, the mean-flux regulated continuum fit at
$\lambda_{\rm rest} < 1280$ Å (black dashed line). The vertical dashed
line indicates $\lambda_{\rm rest} = 1216$ Å in the quasar restframe.
$\langle|\delta C|\rangle$ is the RMS continuum-fitting error evaluated
over the Lyα forest for each individual spectrum.

The first step is to create synthetic quasar spectra from PCA
eigenspectra, by making Gaussian realizations of the PCA weights,
$c_j$ (see Equation 3), in the manner described in Suzuki (2006). Note
that this is an approximation, as the distribution of the weights may
not be fully Gaussian, but it does generate realistic-looking quasar
spectra. In principle, the PCA eigenspectra used to generate the mock
spectra and those used in the fitting procedure should be separate but
drawn from the same distribution. While we do have two sets of PCA
eigenspectra (§ 2.2), they represent quasars with different
luminosities. Hence, we do not expect to be able to use eigenspectra
from Paris et al. (2011) to fit mock spectra generated from the Suzuki
et al. (2005) eigenspectra, and vice versa. However, since we only use
8 principal components in our PCA fitting step (§ 3.1), we can generate
our mock spectra using 10 principal components in order to increase the
uncertainty in the fitting. Nevertheless, the tests described in this
section will primarily apply to the limit in which the PCA eigenspectra
are a good representation of the fitted quasars, which we have argued
(§ 2.2) is a reasonable assumption.
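
A Gaussian realization of the PCA weights might look like the following Python sketch. It assumes that the quasar model is a mean spectrum plus a weighted sum of eigenspectra and that each weight is drawn independently from a zero-mean Gaussian; the arrays `mean_spec`, `eigenspectra` and `weight_sigma` are hypothetical inputs standing in for the published templates and the standard deviations of their weights.

```python
import numpy as np

def make_mock_continuum(mean_spec, eigenspectra, weight_sigma, rng):
    """Draw one mock quasar continuum from a set of PCA templates.

    mean_spec    : (n_pix,) mean quasar spectrum
    eigenspectra : (n_comp, n_pix) principal components
    weight_sigma : (n_comp,) standard deviation of each PCA weight
    rng          : numpy.random.Generator
    """
    weights = rng.normal(0.0, weight_sigma)     # one Gaussian draw per component
    return mean_spec + weights @ eigenspectra, weights
```

With 10 rows in `eigenspectra` this mirrors the choice of generating mocks with more components than the 8 used in the fit.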

Therefore, we generate mock quasar spectra in the spectral range
$\lambda_{\rm rest} = 1020 - 1600$ Å, using 10 principal components from
the Suzuki et al. (2005) eigenspectra. Next, we need to introduce Lyα
forest absorption to the mock spectra. For this, we use the publicly
available Roadrunner Lyα forest simulations of White et al. (2010).
These are N-body simulations with a box size of
$(750\,h^{-1}\,{\rm Mpc})^3$ and a grid scale of $187.5\,h^{-1}$ kpc, in
which the Lyα forest flux was derived using the fluctuating
Gunn-Peterson approximation. The simulations were released in the form
of 22,500 Lyα forest sightlines per box, output at redshifts
$z_{\rm box} \approx 2.00$, 2.25, 2.50, and 2.75.

For a given mock quasar at redshift $z_{\rm QSO}$, we select the
simulation box with the closest redshift, $z_{\rm box}$. Using
Equation 6, we then re-normalize the mean flux of the box to
$\langle F\rangle(z = (1 + z_{\rm QSO})\,1100/1216 - 1)$, i.e. using
the absorber redshift corresponding to $\lambda_{\rm rest} = 1100$ Å in
the quasar spectrum. We choose to normalize the mean flux across the
entire box rather than in individual spectra in order to preserve the
variance across different lines-of-sight, which is a source of error in
the MF-PCA continuum fitting. A random line-of-sight is selected from
the set of skewers, and the transmitted flux in each pixel, $F_i$, is
rescaled to
$F_i' = F_i \times \langle F\rangle(z_{{\rm abs},i}) / \langle F\rangle(z = (1 + z_{\rm QSO})\,1100/1216 - 1)$,
where $z_{{\rm abs},i}$ is the absorber redshift corresponding to the
pixel. This introduces redshift evolution of the mean flux,
$\langle F\rangle(z)$, within the individual sightlines which had
hitherto had a fixed value of $\langle F\rangle$.

http://mwhite.berkeley.edu/BOSS/LyA/RoadRunner/
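
The per-pixel rescaling can be written compactly, as in the Python sketch below. It uses only the central values of the double power law for the mean flux and assumes the skewer has already been mapped onto the quasar rest-frame wavelength grid; the function names are our own.

```python
import numpy as np

LYA = 1216.0  # Lya rest wavelength in Angstrom, rounded as in the text

def mean_flux(z):
    """<F>(z) = exp(-tau_eff), central values of the double power law."""
    z = np.asarray(z, dtype=float)
    tau = np.where(z < 3.2,
                   0.0031 * (1.0 + z) ** 3.49,
                   0.0011 * (1.0 + z) ** 4.21)
    return np.exp(-tau)

def rescale_skewer(wave_rest, flux, z_qso, lam_ref=1100.0):
    """Impose mean-flux evolution along a sightline normalised at lam_ref."""
    z_abs = wave_rest * (1.0 + z_qso) / LYA - 1.0    # absorber redshift per pixel
    z_ref = lam_ref * (1.0 + z_qso) / LYA - 1.0      # redshift of the box normalisation
    return flux * mean_flux(z_abs) / mean_flux(z_ref)
```

By construction the flux at $\lambda_{\rm rest} = 1100$ Å is unchanged, while bluer pixels (lower absorber redshift) are scaled up and redder pixels are scaled down.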

The simulated Lyα forest absorption is added to the mock quasar
spectrum at $\lambda_{\rm rest} < 1216$ Å, and smoothed to the
approximate SDSS resolution, R = 2000. Gaussian noise is then added to
the mock spectrum using the noise array of a randomly-chosen SDSS
quasar spectrum with the same $z_{\rm QSO}$ and S/N. The mock spectra
are then run through the MF-PCA fitting process described above to
obtain continuum fits.

4.2. MF-PCA Continuum-Fitting on Mock Spectra

In Figure 6, we show several examples of the mock spectra and the
fitted MF-PCA continua. The first thing to note is that the mock
spectra look realistic: they resemble the real spectra shown in
Figure 3, apart from the lack of metal absorption redwards of the Lyα
emission line.

As described in § 3.1, we use the mock spectra as a benchmark for the
PCA fit quality on the red side ($\lambda_{\rm rest} > 1216$ Å) of the
spectra: we automatically accept all fits with absolute flux error,
$|\delta F|$, less than the 95th percentile of the $|\delta F|$
distribution measured from the mocks (Figure 5). For the fits with
larger $|\delta F|$ that require visual inspection, we use the mocks as
a visual guide for what constitutes a good fit.

It is clear from Figure 6 that the mean-flux regulated continuum,
$C_{\rm MF}$, is a corrected version of the PCA fit, $C_{\rm PCA}$. In
several cases, the initial PCA continuum, $C_{\rm PCA}$, appeared
unphysical (e.g. dipping below the peaks of the forest). These were
rectified by the mean-flux regulated fit, $C_{\rm MF}$.

We can place this on a more quantitative footing by comparing the
fitted continua, $C_{\rm fit}$, to the 'true' continua, $C_{\rm true}$,
which are known by construction in the mock spectra. We define the
continuum fitting residual,

$$\delta C(\lambda_{\rm rest}) = \frac{C_{\rm fit}(\lambda_{\rm rest})}{C_{\rm true}(\lambda_{\rm rest})} - 1. \qquad (7)$$

We can then carry out MF-PCA fits on large numbers of mock spectra to
obtain statistics on the continuum fitting errors as a function of S/N
and quasar redshift. In Figure 7, we show the residuals from fitting
1000 mock spectra in two bins of signal-to-noise, S/N, and quasar
redshift, $z_{\rm QSO}$. The orange lines show the residuals as a
function of wavelength, $\delta C(\lambda_{\rm rest})$, binned into
restframe 1 Å bins from a subset of 100 mocks. The dashed lines show
the $1\sigma$ dispersion of the residuals estimated from bootstrap
resampling at each 1 Å wavelength bin, while the dotted lines show the
10th and 90th percentiles at each wavelength. Figure 7(a) represents
one of the worst-case scenarios: the signal-to-noise (S/N = 2 - 4) is
low at $\lambda_{\rm rest} > 1216$ Å, making it difficult to obtain a
good fit for the continuum shape. At the same time, the redshift is
sufficiently low that the Lyα forest occupies the blue end
($\lambda_{\rm obs} \lesssim 4000$ Å) of the SDSS spectrographs, where
the signal-to-noise deteriorates rapidly with decreasing wavelength.
This causes a large scatter in the flux of the Lyα forest even when
averaged over large segments, introducing more errors to the mean-flux
regulation process.



At moderate signal-to-noise (S/N = 6 - 10, Figure 7(b)), the situation
is significantly better. The $1\sigma$ dispersion of the fit residuals
is well under 10%, and 6 - 7% in the central portion of the fitted
region. Many of the residuals are flat to within a few percent across
the Lyα forest region, indicating that the PCA fitting has successfully
accounted for the shape of the Lyα continuum. About one-tenth of the
fitted continua are badly fit, with badly-predicted continuum shapes
and/or residuals greater than 10%.

We also calculate the mean bias,
$\overline{\delta C}(\lambda_{\rm rest})$, of the residuals in
Figure 7. Averaged over $\sim 10^3$ spectra, the MF-PCA technique has a
low bias of < 1%, although this does not include any systematic errors
in the $\langle F\rangle(z)$ measurement used to constrain the overall
continuum levels.

To quantify the overall fit quality on each mock spectrum, we use the
RMS of the continuum residuals, $\langle|\delta C|\rangle$, evaluated
over $\lambda_{\rm rest} = 1041 - 1185$ Å.

Figure 8 shows the median RMS error from runs of 1000 mocks as a
function of redshift, for 4 different S/N bins. At lower redshifts, the
RMS error is relatively high for the low-S/N spectra because the
mean-flux regulation is affected by the increased noise levels near the
blue end of the SDSS spectra at $\lambda_{\rm obs} \approx 4000$ Å. As
the observed Lyα forest region clears the blue end of the spectra, the
RMS error decreases to a minimum at $z_{\rm QSO} \approx 3.0$. It then
rises with redshift at $z_{\rm QSO} > 3$ due to the increasing variance
in the Lyα forest, which adds to the error in the mean-flux regulation.
At fixed redshift, the median RMS decreases with S/N as might be
expected. Below $z_{\rm QSO} \approx 3$, it drops below 5% RMS for
moderate signal-to-noise (S/N ~ 5) and asymptotes to ~ 3 - 4% RMS for
the S/N > 10 spectra.
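
For reference, the per-spectrum RMS statistic discussed above can be computed as in the following Python sketch. It assumes an unweighted, per-pixel RMS of the residual of Equation 7 over the forest window; the function name is our own.

```python
import numpy as np

def continuum_rms(wave_rest, c_fit, c_true, lam_min=1041.0, lam_max=1185.0):
    """RMS of delta C = C_fit / C_true - 1 over the Lya forest window."""
    sel = (wave_rest >= lam_min) & (wave_rest <= lam_max)
    delta_c = c_fit[sel] / c_true[sel] - 1.0      # continuum residual, Equation 7
    return np.sqrt(np.mean(delta_c ** 2))
```

A fitted continuum that is uniformly 5% too high, for example, gives an RMS of 0.05.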

To estimate the contribution from various sources of error in the
MF-PCA fitting, we also carried out the mean-flux regulation directly
on the simulated Lyα forest skewers without introducing a quasar
continuum or adding pixel noise. In other words, we directly fit the
function in Equation 5 to the mean flux evaluated over 3 bins in each
skewer. The RMS error in the continuum from this estimate is shown as
the dashed line in Fig. 8. For $z_{\rm QSO} = 2.3$ quasars, the RMS
error contribution from Lyα forest variance is about 1.5%; this
increases to 4% at $z_{\rm QSO} = 4$. This suggests that even in the
limit of high S/N, errors in the continuum shape from PCA fitting
contribute 1 - 2% to the overall RMS continuum error.


4.3. Power-law+Mean Continuum Fitting 

To place the above results in context, we carry out another set of
continuum fits on the mock spectra, using the continuum model

$$C_{\rm mean}(\lambda_{\rm rest}) = f_{1280} \times \mu(\lambda_{\rm rest}) \times \left(\frac{\lambda_{\rm rest}}{1280\,{\rm Å}}\right)^{\alpha}, \qquad (9)$$

in which the mean spectrum, $\mu(\lambda_{\rm rest})$, is multiplied by
a power law, $\propto \lambda_{\rm rest}^{\alpha}$, and $f_{1280}$ is
the flux normalization at $\lambda_{\rm rest} = 1280$ Å. Both
$f_{1280}$ and $\alpha$ are determined separately for each quasar, with
the power law fitted to the regions near $\lambda_{\rm rest} = 1280$ Å
and $\lambda_{\rm rest} = 1450$ Å. This model is highly similar to that
implemented in Slosar et al. (2011).

Fig. 7. Continuum fitting errors from MF-PCA fitting on mock Lyα forest spectra, plotted as a function of quasar restframe wavelength
for (a) 2 < S/N < 4 and 2.5 < $z_{\rm QSO}$ < 2.7; (b) 6 < S/N < 10 and 2.9 < $z_{\rm QSO}$ < 3.1. In both cases, 1000 mock spectra were generated and
continuum-fitted. The grey lines represent a random subset of continuum-fitting errors from 100 fits. We also show the overall bias (solid
line), the dispersion (dashed line), and the 10th/90th percentiles (dotted line) of the errors as a function of wavelength. Note that Figure (a)
represents a regime with the worst MF-PCA continua, and is truncated at the lower limit of the observed spectral range, $\lambda_{\rm obs} = 3840$ Å.

Fig. 8. Median RMS continuum fitting error, $\langle|\delta C|\rangle$,
from fitting 1000 mock spectra, calculated for different redshifts,
$z_{\rm QSO}$, and signal-to-noise, S/N, on the red side of the
spectrum. The rise in the low-S/N values of $\langle|\delta C|\rangle$
at low $z_{\rm QSO}$ is due to the increased noise levels at the blue
end ($\lambda_{\rm obs} < 4000$ Å) of the SDSS spectra, while the
overall increase in $\langle|\delta C|\rangle$ with redshift is due to
the increase in the variance of the Lyα forest. The dashed line shows
the RMS fitting error in the absence of continuum structure and noise.

In Figure 9, we show the continuum residuals from $C_{\rm mean}$ as a
function of rest wavelength in the Lyα forest region. Comparing this
plot with Figure 7(b), which shows the MF-PCA fitting residuals from
the same [S/N, $z_{\rm QSO}$] bin, it is clear that MF-PCA dramatically
reduces the range of fitting errors, by a factor of 3. Indeed, the
$C_{\rm mean}(\lambda_{\rm rest})$ residuals are significantly larger
than even the worst-case scenario for MF-PCA continua, represented in
Figure 7(a). The significant bias (black line) of the residuals from
$C_{\rm mean}(\lambda_{\rm rest})$ is puzzling at first glance, as one
would expect to recover the mean spectrum (and hence no bias) when
averaging over large numbers of spectra. We suspect this bias is most
likely due to an asymmetry in the distribution of power-law spectral
indices in quasars (see Desjacques et al. 2007), which we have not
accounted for in our mock spectra. However, this does not affect the
scatter and shape of the continua, which is the present quantity of
interest. The median RMS error from the continua shown in Figure 9 is
$\langle\delta C\rangle_{\rm rms} = 0.13$, which is worse than any of
the [S/N, $z_{\rm QSO}$] bins evaluated in Figure 8.

Fig. 9. Same as Figure 7(b), but for the power-law+mean spectrum
continua, $C_{\rm mean}$ (Equation 9), fit to 1000 mock spectra. Note
the larger errors in comparison with the MF-PCA residuals.

4.4. Residual Continuum Power 

The RMS continuum fitting error is not the most important quantity for
studies of the 1-dimensional Lyα forest flux power spectrum,

$$P_F(k) = 2\pi \int \delta(k)\,\delta^{*}(k)\,dk, \qquad (10)$$

where $\delta = F/\langle F\rangle - 1$, and $k = 2\pi/l$ is the
Fourier wavenumber. Rather, it is the Fourier power from the continuum
errors which is the troublesome systematic. For example, in their
measurement of $P_F(k)$ from SDSS data, McDonald et al. (2006) were
limited to scales of $k > 0.0014\ {\rm km^{-1}\,s}$, corresponding to
comoving distances of $r < 50\,h^{-1}$ Mpc or
$\lambda_{\rm rest} < 18$ Å in the quasar restframe wavelength. This
was due to the increasing influence of continuum power at large scales.
It is therefore pertinent to investigate the amount of residual Fourier
power introduced by the various continuum fitting methods.

Fig. 10. Mean power spectrum of 1000 residuals from continuum fitting
on mock spectra, calculated using power-law+mean continuum fitting,
$C_{\rm mean}$, on S/N = 6 - 10 spectra (black dot-dashed lines), and
mean-flux regulated PCA fitting, $C_{\rm MF}$, on spectra with
S/N = 2 - 4 (yellow dashed lines) and S/N = 6 - 10 (blue solid lines).
The upper abscissa shows the wavenumber in units of comoving distance,
evaluated at z = 2.75 and assuming a flat ΛCDM cosmology with h = 0.7
and $\Omega_m = 0.28$. The black vertical dot-dot-dot-dashed line
indicates the location of the first BAO peak for this cosmology. The
error bars show the error on the mean estimated from bootstrap
resampling. The red solid line denotes the continuum-limited scale of
$k > 0.0014\ {\rm km^{-1}\,s}$, the smallest k at which McDonald et al.
(2006) measured $P_F(k)$. Red arrows indicate the points at which the
$C_{\rm MF}$ fits reach the same continuum-limited power as
$C_{\rm mean}$. The vertical red-dotted and red-dashed lines indicate
the new continuum-limited scale for the S/N = 2 - 4 and S/N = 6 - 10
MF-PCA fits, respectively.

In Figure 10, we show the mean power spectrum of the continuum
residuals, $\delta C(\lambda_{\rm rest})$, from 1000 mock spectra, for
the power-law+mean continuum method, $C_{\rm mean}$, and MF-PCA
continuum fitting, $C_{\rm MF}$, shown for two signal-to-noise bins.
All the mock spectra had quasar redshifts in the range
$z_{\rm QSO} = 2.9 - 3.1$. All the residual power spectra have the same
overall shape, with a bump at $k \approx 0.0005\ {\rm km^{-1}\,s}$
corresponding to the weak emission lines in the intrinsic quasar
spectrum at scales of $\lambda_{\rm rest} \sim 50$ Å. This is
unsurprising, since imperfections in the continuum fitting should give
similarly-shaped residuals. At fixed k, the amplitudes follow the same
pattern discussed above: the residual power from MF-PCA fitting is
significantly lower than that from the power-law+mean continuum fits.
Even with noisy S/N ≈ 3 spectra, $C_{\rm MF}$ continuum fitting reduces
the residual power by ~ 30% compared to $C_{\rm mean}$ fitted to
higher-S/N spectra.

In their measurement of $P_F(k)$ from SDSS data, McDonald et al. (2006)
had used a mean continuum shape for all their spectra, which is similar
to $C_{\rm mean}$ except that they did not fit for the individual
power-law indices. Using this method, McDonald et al. (2006) found that
residual continuum power started becoming problematic at scales larger
than that corresponding to $k = 0.0014\ {\rm km^{-1}\,s}$, indicated by
the solid red vertical line in Figure 10. We thus estimate the residual
power in $C_{\rm mean}$ at this scale, at which continuum power
interferes with $P_F(k)$ measurements. We can then look for the points
in the $C_{\rm MF}$ residual power spectra with the same limiting power
(red arrows in Figure 10). Note that this assumes that the Lyα forest
power is constant, whereas in reality it decreases with scale; the
change is gradual, however, so for our rough estimates we can
approximate it as constant over small logarithmic intervals.

The corresponding limiting values of k (red dotted lines) are
significantly smaller than for $C_{\rm mean}$. This suggests that
MF-PCA continuum fitting could allow the Lyα forest flux power spectrum
to be measured at larger scales than previously possible. For
S/N = 6 - 10 spectra, the lower k-limit is now
$k = 0.0007\ {\rm km^{-1}\,s}$. This corresponds to a doubling of the
accessible comoving scales: $r = 2\pi/k \approx 90\,h^{-1}$ Mpc at
z = 2.75 compared to $r \approx 45\,h^{-1}$ Mpc in the McDonald et al.
(2006) study, where these distances are calculated assuming a standard
flat ΛCDM cosmology with h = 0.7, $\Omega_m = 0.28$ and a $w = -1$
cosmological constant. Even for noisy (S/N = 2 - 4) spectra, the
accessible scale has been increased significantly to
$r \approx 65\,h^{-1}$ Mpc.
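
The conversion between a velocity-space wavenumber and a comoving scale can be checked with a short calculation. The Python sketch below assumes a flat ΛCDM background with matter and a cosmological constant only (radiation neglected) and uses $\Delta r = \Delta v\,(1+z)/H(z)$; working in $h^{-1}$ Mpc means the value of h drops out. The function name is our own.

```python
import numpy as np

def comoving_scale(k_s_per_km, z, omega_m=0.28):
    """Return r = 2*pi/k in comoving h^-1 Mpc for a flat LCDM cosmology."""
    e_z = np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))   # H(z) / H0
    hubble = 100.0 * e_z            # H(z) in km/s per (h^-1 Mpc), proper
    delta_v = 2.0 * np.pi / k_s_per_km                          # km/s
    return delta_v * (1.0 + z) / hubble                         # comoving h^-1 Mpc

r_new = comoving_scale(0.0007, 2.75)   # new limit for S/N = 6 - 10
r_old = comoving_scale(0.0014, 2.75)   # previous continuum-limited scale
```

Under these assumptions the two limits come out at roughly 86 and 43 $h^{-1}$ Mpc, consistent with the rounded values quoted above.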

The new continuum-limited scales approach the ~ $100\,h^{-1}$ Mpc
baryon acoustic oscillation (BAO) scale at moderate signal-to-noise
(S/N > 6). In Fig. 10, the black vertical dot-dot-dot-dashed line
indicates the wavenumber of the first BAO peak, calculated from the
prescription in Eisenstein & Hu (1998). Even though future BAO
measurements in the Lyα forest are expected to be carried out in 3
dimensions (McDonald & Eisenstein 2007), the increase in accessible
modes along the lines-of-sight will improve the robustness and
precision of the measurements.

5. RESULTS AND CONCLUSION 
5.1. Public Release of Continua 

We have carried out mean-flux regulated PCA (MF-PCA) continuum fitting
on 12,069 quasar spectra from the SDSS DR7 catalog. The continua in the
spectral range 1030 Å < $\lambda_{\rm rest}$ < 1600 Å have been made
publicly available, and can be downloaded via anonymous FTP. The IDL
fitting code took ~ 0.5 seconds per spectrum (including file
input/output) on a single processor core of a 3.0 GHz Intel Quad Core
desktop with 2 GB of RAM, allowing the entire SDSS DR7 Lyα forest
sample to be fitted in about two hours.

Since we expect the MF-PCA technique to provide a good continuum fit
only when there is a good PCA fit redwards of the quasar Lyα line, the
fitted spectra have been visually inspected to verify the fit quality
in the $\lambda_{\rm rest} = 1216 - 1600$ Å wavelength region.
Approximately 89% of the spectra had reasonable PCA fits, and these
have been flagged as such in the publicly available continua, although
we recommend that users of the continua make their own cuts on the fit
quality. Approximately 30% of the spectra were better fitted by the
low-redshift Suzuki et al. (2005) templates, while the rest were better
fitted by the Paris et al. (2011) templates. This is qualitatively as
expected, since there is greater overlap by the Paris et al. (2011)
templates in the luminosity distribution of the SDSS quasars.

From the mock spectra analysis of the fit quality in § 4, we have
estimated the continuum fitting error at each pixel within the Lyα
forest, as a function of quasar redshift and spectral signal-to-noise.
However, the errors have significant covariances, so it would be too
unwieldy to release the full error estimates for each spectrum; we can
provide them upon request.

It is worth noting that although we have made a choice on the mean flux
of the Lyα forest in our fits (Eq. 6), it is straightforward for users
to rescale each fitted continuum to their favorite
$\langle F\rangle(z)$.

5.2. Empirical Tests of Fit Quality 

While we have studied the performance of MF-PCA continuum fitting on
mock spectra in § 4, it is difficult to empirically constrain the
quality of the fits. One possibility is to compare a small subset of
the data with high-resolution, high-S/N spectra of the same objects.
However, even with high-resolution spectra, it is questionable whether
there are sufficient transmission peaks in the forest to adequately
constrain the continuum shape; Faucher-Giguère et al. (2008) have shown
that accurate fitting of the quasar continuum is difficult beyond
z ≈ 2.5. Furthermore, most high-resolution spectra are obtained from
echelle spectrographs with uncertain spectrophotometry, so it would be
tricky to directly compare quasar sightlines which have been observed
in both SDSS and high-resolution echelle spectrographs.

However, it is possible to get a sense of the efficacy of our MF-PCA
continua by stacking large numbers of spectra. This cancels out the Lyα
forest power from individual sightlines, and allows the underlying
continuum shape to be seen, albeit lowered due to the mean Lyα
absorption.

Recall that the PCA coefficients, $c_j$, parametrize the shape of the
quasar spectra. Therefore, if our continuum-fitting technique works,
quasars with similar $c_j$ measured from $\lambda_{\rm rest} > 1216$ Å
should have similar-looking continua within the Lyα forest region.
Thanks to the large number of spectra in our SDSS sample, it is
possible to stack $\sim 10^2$ spectra with similar values of $c_j$ to
recover the collective shape of their Lyα forest continua.

We can select subsamples of quasars based on their values of
$\sigma_1 = c_1/\lambda_1$ and $\sigma_2 = c_2/\lambda_2$, where
$\lambda_1 = 7.563$ and $\lambda_2 = 3.604$ are the standard deviations
of $c_1$ and $c_2$, respectively, in the low-redshift HST eigenspectra
(Suzuki 2006). These two principal components account for approximately
80% of the total variance in the low-redshift quasar templates. We
limit ourselves to spectra with S/N > 3 pixel$^{-1}$, and which have
been visually inspected to be decent fits redwards of
$\lambda_{\rm rest} = 1216$ Å. We also select quasars with
$z_{\rm QSO} > 2.6$ in order to ensure reasonably complete coverage of
the Lyα forest. Within a subsample, each spectrum is first normalized
near $\lambda_{\rm rest} = 1280$ Å and rebinned into a common
wavelength grid with $\Delta\lambda_{\rm rest} = 1$ Å bins before being
stacked. The same procedure is carried out on the MF-PCA continuum
fitted to each spectrum, to obtain a mean MF-PCA continuum for the
subsample.
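
The stacking procedure can be sketched in Python as follows. This is an illustration only: it assumes an unweighted mean, a normalization window of 1275 - 1285 Å around the 1280 Å pivot, and rebinning by simple averaging of the pixels that fall in each 1 Å bin; the function name and defaults are our own.

```python
import numpy as np

def stack_spectra(waves, fluxes, grid=None, norm_window=(1275.0, 1285.0)):
    """Normalise, rebin to a common rest-frame grid, and average spectra.

    waves, fluxes : sequences of 1-D arrays (rest wavelength and flux)
    grid          : bin edges in Angstrom; defaults to 1 Angstrom bins
    """
    if grid is None:
        grid = np.arange(1030.0, 1600.0, 1.0)
    stack_sum = np.zeros(grid.size - 1)
    counts = np.zeros(grid.size - 1)
    for wave, flux in zip(waves, fluxes):
        in_norm = (wave >= norm_window[0]) & (wave <= norm_window[1])
        flux = flux / np.mean(flux[in_norm])           # normalise near 1280 A
        total, _ = np.histogram(wave, bins=grid, weights=flux)
        n_pix, _ = np.histogram(wave, bins=grid)
        good = n_pix > 0                               # bins covered by this spectrum
        stack_sum[good] += total[good] / n_pix[good]
        counts[good] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return stack_sum / counts                      # NaN where no coverage
```

Applying the same function to the fitted continua gives the mean MF-PCA continuum of the subsample on the same grid.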

In Fig. 11, we show 4 subsamples from our SDSS sample with different
[$\sigma_1$, $\sigma_2$] with respect to the low-redshift Suzuki et al.
(2005) eigenspectra. Redwards of 1216 Å, we see that the least-squares
PCA procedure generally does a good job of fitting the emission lines,
although there are inaccuracies in fitting N V λ1240 and Si II λ1306.
Bluewards of 1216 Å, the stacked spectrum appears to have a similar
shape to the fitted continua, although the overall flux level is
depressed due to the mean Lyα absorption.

We can make a more direct comparison between the stacked spectra and
the fitted Lyα forest continua by correcting each observed Lyα forest
pixel by its mean flux (using Equation 6) before stacking. The
mean-flux corrected Lyα forest is shown as the disembodied red line in
Fig. 11. It is gratifying to see that the stacked MF-PCA continua
generally agree well with the stacked Lyα forest spectra. Our technique
can clearly account for the diversity in quasar continua: spectra with
clear emission-line features (Fig. 11a) and those with smooth continua
(Fig. 11b) are well differentiated. Because we have corrected each Lyα
forest pixel by the same mean flux (Eq. 6) which we have used to carry
out the MF-PCA fits, we expect the amplitude of the corrected Lyα
forest stacks, in $\lambda_{\rm rest} = 1041 - 1185$ Å, to match that
of the stacked MF-PCA continua; the agreement in the tilt and shape of
the continuum, however, bears testament to the success of the
technique. In addition, since the MF-PCA continua shown in Fig. 11 were
a subset which used the Suzuki et al. (2005) quasar templates, this
suggests that it was appropriate to use low-redshift templates to fit
some of the $z_{\rm QSO} \gtrsim 2$ SDSS spectra.

5.3. Conclusions 

We have introduced mean-flux regulated PCA (MF-PCA) continuum fitting,
a new technique for predicting the Lyα forest continuum in low-S/N
spectra. In tests on mock spectra, we have found that MF-PCA can
predict the continuum at the 8% RMS level in SDSS spectra with S/N ~ 2
at z = 2.5, and < 5% RMS in S/N > 5 spectra. This is a significant
improvement over the ~ 15% RMS continuum errors previously achievable
in low-S/N spectra. We are making available MF-PCA continuum fits for
12,069 Lyα forest spectra from the SDSS DR7 quasar catalog. The MF-PCA
technique also reduces the Fourier power from continuum-fitting
residuals by a factor of a few in comparison with dividing by a mean
continuum. This will allow a concomitant increase in the accessible
scales for Lyα forest flux power spectrum measurements.

This improved continuum-fitting accuracy will significantly increase
the value of low-S/N Lyα forest data. For example, the ongoing Baryon
Oscillations Spectroscopic Survey (BOSS) will obtain Lyα forest spectra
from ~ 150,000 quasars at $z_{\rm QSO} \gtrsim 2$, with the aim of
measuring the baryon acoustic oscillation feature in the Lyα forest
absorption across different quasar sightlines. The typical
signal-to-noise (S/N ~ 2) of BOSS spectra will be even lower than that
of SDSS (S/N ~ 4); we therefore expect the MF-PCA technique to
contribute significantly to the utility of the BOSS data.

Fig. 11. Stacked SDSS Lyα forest spectra (black) and similarly-stacked MF-PCA continuum fits, plotted for narrow selections of the
first two PCA eigenvalues, $\sigma_1$ and $\sigma_2$. The red curve shows the 1041 Å < $\lambda_{\rm rest}$ < 1185 Å Lyα forest region of the spectra which have been
corrected by the mean flux prior to stacking. The agreement of the MF-PCA stacks with the mean-flux corrected Lyα forest stacks shows
that the MF-PCA is doing a good job of predicting the shape of the Lyα forest continuum.

There are several ways in which the current work could be improved.
The Suzuki et al. (2005) PCA templates which we have used to fit some
of the spectra were derived from a low-redshift quasar sample which may
not be a perfect descriptor of the SDSS data (although the test in
§ 5.2 shows that it does a reasonable job). In addition, while the
$z_{\rm QSO} \sim 3$ Paris et al. (2011) templates were indeed obtained
from SDSS quasars, they used a high-luminosity subset which is not
representative of the full SDSS luminosity distribution. Furthermore,
the hand-fitting technique which they had used to obtain continua from
these spectra cannot be used for the lower-luminosity (and hence
lower-S/N) quasars.

However, with a large data set such as SDSS or BOSS, it is possible to
regard the Lyα forest absorption within individual spectra as a noise
term which cancels out with sufficiently large numbers of template
spectra. This will allow new eigenspectra to be generated from the data
itself, although each individual spectrum will need to be corrected by
its mean flux before being included in the eigenspectrum solution. In
the near future, we will work on this technique to generate new
eigenspectra from the BOSS data.

The other issue with the MF-PCA fitting is that it requires an assumed
mean flux, $\langle F\rangle(z)$, for the Lyα forest. This is not
ideal, as the evolution of the mean flux is an important observable of
the Lyα forest. This could in principle be overcome by solving
simultaneously for the mean flux of the Lyα forest and the
continuum-fitting parameters for the individual spectra, using
maximum-likelihood techniques. This would allow large Lyα forest data
sets to be continuum-fitted and studied in a fully self-consistent
fashion.